Abstract

We show here that the problem of maximizing a family of quantitative functions, encompassing both the modularity (Q-measure) and the modularity density (D-measure), for community detection can be uniformly understood as a combinatorial optimization involving the trace of a matrix called the modularity Laplacian. Instead of using the traditional spectral relaxation, we impose an additional nonnegativity constraint on this graph clustering problem and design efficient algorithms to optimize the new objective. With the explicit nonnegativity constraint, our solutions are very close to the ideal community indicator matrix and can be used directly to assign nodes to communities. The near-orthogonal columns of the solution can be interpreted as the posterior probabilities of the corresponding node belonging to each community. The proposed method can therefore be exploited to identify fuzzy or overlapping communities, which facilitates the understanding of the intrinsic structure of networks. Experimental results show that our new algorithm consistently, sometimes significantly, outperforms the traditional spectral relaxation approaches.

Keywords: Modularity maximization; community detection; nonnegative relaxation; 
fuzzy communities; overlapping communities 
1 Introduction



Networks have become a popular tool for describing complex real-world systems. Typ- 
ical examples include the Internet, metabolic networks, food webs, neural networks, 
technological networks, communication and distribution networks, and social networks 
(see [1] and references therein for more examples). It has been observed that methods for
associating a suitable community structure to a given network can support our analysis. 
Adopting a rather coarse-grained top-down point of view, one can elucidate a network's 
intrinsic structure and reveal its global or overall organization in terms of its modules 
or communities, i.e., suitably chosen collections of disjoint groups of nodes such that 
the nodes within each group are joined together in a tightly-knit fashion, while only 
looser connections exist between the nodes in different groups [6, 17].

This description is just an intuitive concept rather than a rigorous definition, and
thus is far from meeting the requirements for detecting communities in graphs via
computational algorithms. A first quantitative measure for evaluating the "goodness of
fit" of a partition to a graph, the modularity function Q, was proposed by Newman and
Girvan in [15]. Using this or similar functions, community detection is reduced
to optimization problems, which has led to the emergence of a considerable number 
of community-detection procedures (see for recent reviews). The modularity 

function Q, however, has been shown to have limited resolution, i.e., communities
smaller than a certain scale may not be revealed by optimizing Q, even in the
extreme case where they are cliques connected by single bridges [3]. To address this
issue, another measure, namely the modularity density or D-measure [11], was recently
proposed. As suggested by theoretical analysis as well as by numerical tests, optimizing
this new measure does not seem to exhibit the resolution limit of the Q-measure.

Unfortunately, these two optimization problems are both NP-hard [2, 15]. Inspired by the idea of spectral graph clustering, White and Smith [20] showed that maximizing the (original) Q-function can be reformulated as a spectral relaxation problem, i.e., an eigenvector problem involving a matrix called the Q-Laplacian, implying that modularity-based clustering can be understood as a special instance of spectral clustering; see also [7], where it was shown that this spectral-clustering approach can be extended quite naturally to take edge-weight data into account. The main disadvantage of this spectral relaxation approach is that the eigenvectors have mixed-sign entries, which may deviate severely from the true clustering indicator vectors (that these eigenvectors approximate). Therefore, one often has to resort to other clustering methods, such as K-means, to obtain the final clustering results [7, 20].

To go beyond previous work, we propose here a new method to optimize the modularity functions under an additional nonnegativity constraint. With the explicit nonnegativity constraint, the solutions are very close to the ideal community indicator matrix and can be used directly to assign cluster labels to nodes. We give efficient algorithms that solve this problem rigorously under the nonnegativity constraint. Experimental results show that our algorithm always converges and that its performance is significantly improved in comparison with the traditional spectral relaxation approach.

The rest of this paper is organized as follows. Section 2 shows that the problem of maximizing a family of modularity functions, encompassing both the Q-measure and the D-measure, for community detection can be reformulated as a combinatorial optimization involving the trace of a matrix called the modularity Laplacian. Our proposed nonnegative relaxation approaches are introduced in Section 3. We emphasize the soft clustering nature of this method, which facilitates identifying fuzzy communities of networks, in Section 4. Experimental results on synthetic as well as real-world networks are reported in Section 5. Finally, we conclude our work in Section 6.



2 Problem reformulation 
2.1 Modularity functions 

Our starting point is a family of quantitative modularity functions. Let $G = (V, E, W)$ be a network with $n$ nodes, where $V$ is the set of nodes, $E$ is the set of edges, $W = (w(u,v))_{(u,v) \in E}$ is the weight matrix, and $w(u,v)$ is the weight of the edge connecting the nodes $u$ and $v$. For any two subsets $V_1, V_2$ of $V$, let $L(V_1, V_2) := \sum_{u \in V_1, v \in V_2} w(u,v)$ denote the sum of the weights of all edges connecting a vertex $u \in V_1$ and a vertex $v \in V_2$. Given a partition $\Pi = \{V_1, \dots, V_K\}$ of the network $G$, the modularity function (Q-measure) is defined as [15]

$$
Q(\Pi) := \sum_{k=1}^{K} \left[ \frac{L(V_k, V_k)}{L(V, V)} - \left( \frac{L(V_k, V)}{L(V, V)} \right)^{2} \right]. \tag{1}
$$

This measure provides a means to determine whether or not a partition is good enough to decipher the community structure of a network. Generally, a larger Q value corresponds to a better identification of the community structure. As reported in [3], the Q-measure suffers from a resolution limit, since the size of a detected module depends on the size of the whole network. Recently, Li and his collaborators [11] proposed a quantitative measure called the modularity density (D-measure) to resolve this difficulty. For a specific network partition $\Pi = \{V_1, \dots, V_K\}$, the modularity density $D(\Pi)$ is given by the sum

$$
D(\Pi) := \sum_{k=1}^{K} \frac{L(V_k, V_k) - L(V_k, V \setminus V_k)}{|V_k|}. \tag{2}
$$

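Obviously, it can be viewed as a combination of the ratio association (the first term) and the ratio cut (the second term), and it can therefore be naturally extended to a family of quantitative functions

$$
D_\lambda(\Pi) := \sum_{k=1}^{K} \frac{2(1-\lambda)\, L(V_k, V_k) - 2\lambda\, L(V_k, V \setminus V_k)}{|V_k|}, \tag{3}
$$

where $0 \le \lambda \le 1$ is the trade-off parameter. When $\lambda = 1$, maximizing $D_\lambda$ amounts to minimizing the ratio cut; when $\lambda = 0$, it amounts to maximizing the ratio association; when $\lambda = 0.5$, $D_\lambda$ is the modularity density $D$.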
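
As an illustration of these definitions, the following minimal Python sketch evaluates the Q-measure and $D_\lambda$ for a given partition. It assumes NumPy, a dense symmetric weight matrix `W`, and a partition given as lists of node indices. The function names (`link_weight`, `q_measure`, `d_lambda`) and the toy graph are hypothetical, and $D_\lambda$ is written with the factor 2 that makes $\lambda = 0.5$ coincide with the D-measure.

```python
import numpy as np

def link_weight(W, a, b):
    """L(A, B): total weight of the edges going from nodes in `a` to nodes in `b`."""
    return W[np.ix_(a, b)].sum()

def q_measure(W, communities):
    """Q-measure of a partition given as a list of node-index lists."""
    nodes = list(range(W.shape[0]))
    vol = link_weight(W, nodes, nodes)              # L(V, V)
    return sum(
        link_weight(W, c, c) / vol - (link_weight(W, c, nodes) / vol) ** 2
        for c in communities
    )

def d_lambda(W, communities, lam=0.5):
    """Generalized modularity density; lam=0.5 gives the D-measure (assumed form)."""
    n = W.shape[0]
    total = 0.0
    for c in communities:
        members = set(c)
        rest = [u for u in range(n) if u not in members]
        inner = link_weight(W, c, c)                # L(V_k, V_k)
        outer = link_weight(W, c, rest)             # L(V_k, V \ V_k)
        total += (2 * (1 - lam) * inner - 2 * lam * outer) / len(c)
    return total

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3).
W = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0
parts = [[0, 1, 2], [3, 4, 5]]
print("Q =", q_measure(W, parts))
print("D =", d_lambda(W, parts, lam=0.5))
```

Note that $L(V_k, V_k)$ counts every intra-community edge twice because the sum runs over ordered pairs of nodes, which matches the matrix form $x_k^T W x_k$.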

Let $x_k$, $k = 1, \dots, K$, be the community indicators, where the $u$-th element of $x_k$ is 1 if the node $u$ belongs to community $k$, and 0 otherwise. For example, if the nodes within each community are indexed consecutively, then $x_k = (0, \dots, 0, 1, \dots, 1, 0, \dots, 0)^T$. If we further define $Q = (q_1, \dots, q_K)$ as $q_k = x_k / \|x_k\|_2$, the previous works [7, 20] show that the maximization of the modularity function can be reformulated as follows:

$$
\max_{Q} \; \mathrm{Tr}\, Q^T (W' - \Gamma') Q \quad \text{s.t.} \quad Q^T Q = I, \; Q \ge 0, \tag{4}
$$

where $W' := W / \mathrm{vol}\, G$, $\Gamma' := d d^T / (\mathrm{vol}\, G)^2$, $d = (d_1, \dots, d_n)^T$ such that the component $d_u$ equals the weighted degree of node $u$, and $\mathrm{vol}\, G = L(V, V)$ is the total weighted degree of the network (twice the total edge weight). Clearly, this is a combinatorial optimization, i.e., a matrix trace maximization with nonnegativity and orthogonality constraints. We will demonstrate here that maximizing the family of quantitative functions (3) can be represented as the same type of problem.

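We can easily see that $L(V_k, V_k) = \sum_{u \in V_k, v \in V_k} w(u,v) = x_k^T W x_k$ and $L(V_k, V) = x_k^T D x_k$, where $D$ is the degree matrix of the network $G$, so that $L(V_k, V \setminus V_k) = x_k^T (D - W) x_k$. Thus, the objective function can be rewritten as

$$
D_\lambda = \sum_{k=1}^{K} \frac{2(1-\lambda)\, x_k^T W x_k - 2\lambda\, x_k^T (D - W) x_k}{x_k^T x_k}
= 2 \sum_{k=1}^{K} \frac{x_k^T (W - \lambda D) x_k}{x_k^T x_k}, \tag{5}
$$

which yields, up to the constant factor 2,

$$
\max_{Q} D_\lambda = \mathrm{Tr}\, Q^T (W - \lambda D) Q \quad \text{s.t.} \quad Q^T Q = I, \; Q \ge 0. \tag{6}
$$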

Taken together, Eqs. (4) and (6) are the same type of optimization problem, i.e., a matrix trace maximization with nonnegativity and orthogonality constraints, which can be uniformly represented as

$$
J_{mmn} = \max_{Q} \; \mathrm{Tr}\, Q^T (-M) Q \quad \text{s.t.} \quad Q^T Q = I, \; Q \ge 0. \tag{7}
$$

The matrix $M$, corresponding to the modularity function under investigation, plays a role similar to that of the combinatorial Laplacian matrix in graph theory [7, 20]. Analogously, we call it the modularity Laplacian hereafter.
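
The trace formulation can be checked numerically. The sketch below is a minimal illustration, assuming NumPy and assuming $M = \lambda D - W$ for the $D_\lambda$ family and $M = d d^T / (\mathrm{vol}\, G)^2 - W / \mathrm{vol}\, G$ for the Q-measure; the helper names (`modularity_laplacian`, `indicator_matrix`, `trace_objective`) and the toy graph are hypothetical.

```python
import numpy as np

def modularity_laplacian(W, measure="D", lam=0.5):
    """Return M such that the objective reads Tr Q^T (-M) Q (assumed forms)."""
    d = W.sum(axis=1)                       # weighted degrees
    if measure == "Q":
        vol = d.sum()                       # vol G = L(V, V)
        return np.outer(d, d) / vol**2 - W / vol
    return lam * np.diag(d) - W             # D_lambda family

def indicator_matrix(communities, n):
    """Column k is x_k / ||x_k||, so the columns are orthonormal."""
    Q = np.zeros((n, len(communities)))
    for k, members in enumerate(communities):
        Q[members, k] = 1.0 / np.sqrt(len(members))
    return Q

def trace_objective(M, Q):
    return np.trace(Q.T @ (-M) @ Q)

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3).
W = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0

Q = indicator_matrix([[0, 1, 2], [3, 4, 5]], 6)
M = modularity_laplacian(W, "D", lam=0.5)
print("Tr Q^T(-M)Q =", trace_objective(M, Q))   # half of D for this partition
```

For the ideal indicator matrix the trace equals $\sum_k x_k^T (W - \lambda D) x_k / |V_k|$, i.e. $D_\lambda / 2$ under the factor-2 convention assumed here, so maximizing one maximizes the other.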



2.2 Spectral embedding and clustering 

Note that there is a canonical one-to-one correspondence between subsets $U \subseteq V$ and $\{0,1\}$-vectors $x \in \mathbb{R}^n$, given by associating to each such vector $x \in \mathbb{R}^n$ its support $\mathrm{supp}(x) := \{u \in V : x(u) \neq 0\}$. It induces a canonical one-to-one correspondence between partitions $\Pi$ of $V$ and collections $X = \{x_1, \dots, x_K\}$ of $\{0,1\}$-vectors that sum up to the "all-one vector" $1_n \in \mathbb{R}^n$, i.e., the map $1_V : V \to \mathbb{R}$ that maps every $v \in V$ onto 1. In consequence, the measure of a partition $\Pi$ of $V$ of rank $K$ is, by Eq. (7), bounded from above by the supremum $\sup\left(\sum_{e \in \mathcal{E}} e^T(-M)e : \mathcal{E}\right)$, where $\mathcal{E}$ runs over all collections of $K$ pairwise orthogonal unit vectors in $\mathbb{R}^n$. Since this supremum coincides with the negative of the sum of the $K$ smallest eigenvalues of the symmetric matrix $M$ (see, e.g., Section II in [7]), we have $J_{mmn} \le -\sum_{j=1}^{K} \lambda_j$, where $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ is the sequence, in ascending order, of the necessarily real eigenvalues of the symmetric matrix $M$.

Thus, following the ideas presented in [7, 20], we can construct reasonably good community structures as follows. For each $k = 1, \dots, n$, we may choose an eigenvector $e_k$ of $M$ with eigenvalue $\lambda_k$ so that the $e_k$ ($k = 1, \dots, n$) form an orthonormal family of eigenvectors of $M$, and then consider, for each $K = 1, 2, \dots, n$, the associated spectral embedding $\chi_K$ of $V$ into $\mathbb{R}^{\{1,2,\dots,K\}}$ that maps any $v \in V$ onto the point

$$
\chi_K(v) : \{1, 2, \dots, K\} \to \mathbb{R} : k \mapsto e_k(v)
$$

in $\mathbb{R}^{\{1,2,\dots,K\}}$. If the $e_k$ were of the form $e_k = q_k = x_k / \|x_k\|_2$ for some collection $x_1, \dots, x_K$ of $K$ (necessarily pairwise orthogonal) $\{0,1\}$-vectors in $\mathbb{R}^n$ with $\sum_{k=1}^{K} x_k = 1_n$ associated to a $K$-partition $\Pi$, the support of any vector of the form $\chi_K(v)$ for some $v \in V$ would be the one-element subset $\{k\}$ with $v \in \mathrm{supp}(x_k) = \mathrm{supp}(e_k)$, and two vectors of the form $\chi_K(u)$ and $\chi_K(v)$ ($u, v \in V$) would coincide if and only if $u$ and $v$ belong to the same subset $U$ in $\Pi$.
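
The spectral relaxation baseline described above can be sketched as follows. This is a minimal illustration, assuming NumPy, the $D_\lambda$ Laplacian $M = \lambda D - W$ with $\lambda = 0.5$, and a small K-means with a deterministic farthest-point initialisation written here for self-containedness; the function names `kmeans` and `spectral_relaxation` and the toy graph are hypothetical.

```python
import numpy as np

def kmeans(X, K, iters=100):
    """Plain K-means on the rows of X with farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, K):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

def spectral_relaxation(M, K):
    """Embed nodes with the eigenvectors of the K smallest eigenvalues of M."""
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    embedding = eigvecs[:, :K]              # row v is the point chi_K(v)
    return kmeans(embedding, K)

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3).
W = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0
M = 0.5 * np.diag(W.sum(axis=1)) - W
labels = spectral_relaxation(M, 2)
print(labels)
```

The second clustering step is needed precisely because the eigenvectors have mixed-sign entries and cannot be read as indicator vectors directly.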

3 Modularity function maximization with nonnegative relaxation

By definition, each row of the indicator matrix Q has exactly one positive element, and all other elements are zero. Thus, we require a solution that optimizes a quadratic function of Q under two constraints: (1) orthonormality and (2) nonnegativity. These difficult constraints should be relaxed to make the problem tractable. If we retain orthogonality while ignoring nonnegativity, the solution is the aforementioned spectral relaxation approach. The main disadvantage of this relaxation is that the eigenvectors have mixed-sign entries, which may deviate severely from the true community indicator vectors. Therefore, most previous applications resort to a two-step procedure [7, 20]: (1) embedding the network into the eigenvector space and (2) clustering the embedded points using another algorithm, such as K-means clustering.

A more accurate relaxation keeps the nonnegativity constraints on Q. One can see that when the orthogonality and nonnegativity constraints are satisfied simultaneously, the solution will be very close to the ideal indicator matrix and can thus be used directly to assign community labels to nodes. This motivates our nonnegative relaxation approach for the modularity function optimization. Formally, the optimization of Eq. (7) is mathematically identical to

$$
\max_{Q} \; \mathrm{Tr}\, Q^T (\rho I - M) Q \quad \text{s.t.} \quad Q^T Q = I, \; Q \ge 0, \tag{8}
$$

because the $\rho$ term $\mathrm{Tr}\, Q^T \rho I Q = \rho\, \mathrm{Tr}\, Q^T Q = \rho K$ is a constant. In particular, we set $\rho = \lambda_{\max}(M)$ to be the largest eigenvalue of $M$, so that $\rho I - M$ is positive semidefinite. This transformation makes the optimization a well-behaved problem. Writing $J_{mmn}(Q) = \mathrm{Tr}\, Q^T (\rho I - M) Q$ for the objective of Eq. (8), we begin with the Lagrangian function

$$
\mathcal{L} = J_{mmn}(Q) - \mathrm{Tr}\, \Lambda (Q^T Q - I) - \mathrm{Tr}\, \Sigma Q^T, \tag{9}
$$

where the Lagrange multiplier $\Lambda$ enforces the orthogonality condition $Q^T Q = I$ and the Lagrange multiplier $\Sigma$ enforces the nonnegativity $Q \ge 0$. The KKT complementary slackness condition becomes



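$$
\frac{\partial \mathcal{L}}{\partial Q_{ik}}\, Q_{ik} = \left[ (\rho I - M) Q - Q \Lambda \right]_{ik} Q_{ik} = 0. \tag{10}
$$

Clearly, a fixed point also satisfies

$$
\left[ (\rho I - M) Q - Q \Lambda \right]_{ik} Q_{ik}^2 = 0, \tag{11}
$$

which is mathematically identical to Eq. (10). Summing over $i$ and using $Q^T Q = I$, we obtain $\Lambda_{kk} = [Q^T (\rho I - M) Q]_{kk}$. This gives the diagonal elements of $\Lambda$. To find the off-diagonal elements of $\Lambda$, we temporarily ignore the nonnegativity requirement and set $\partial \mathcal{L} / \partial Q = 0$, which leads to $\Lambda_{kl} = [Q^T (\rho I - M) Q]_{kl}$ for $k \neq l$. Combining these two results yields

$$
\Lambda = Q^T (\rho I - M) Q. \tag{12}
$$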

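We decompose $M$ and $\Lambda$ into their positive and negative parts as

$$
M = M^+ - M^-, \qquad \Lambda = \Lambda^+ - \Lambda^-,
$$

where $M^+ = (|M| + M)/2$, $\Lambda^+ = (|\Lambda| + \Lambda)/2$, $M^- = (|M| - M)/2$ and $\Lambda^- = (|\Lambda| - \Lambda)/2$, respectively. Now concentrating on the variable $Q$ in $\mathcal{L}$, we have

$$
\frac{1}{2} \frac{\partial \left( J_{mmn} - \mathrm{Tr}\, \Lambda Q^T Q \right)}{\partial Q}
= \rho Q - M^+ Q + M^- Q - Q \Lambda^+ + Q \Lambda^-
= (\rho Q + M^- Q + Q \Lambda^-) - (M^+ Q + Q \Lambda^+). \tag{13}
$$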

As in nonnegative matrix factorization (NMF) [10], Eq. (13) leads to the following multiplicative update formula:

$$
Q_{ik} \leftarrow Q_{ik} \sqrt{ \frac{(\rho Q + M^- Q + Q \Lambda^-)_{ik}}{(M^+ Q + Q \Lambda^+)_{ik}} }. \tag{14}
$$

We can see that under this update, $Q_{ik}$ increases when the corresponding element of the gradient in Eq. (13) is larger than zero, and decreases otherwise. Therefore, the update direction is consistent with the update direction of the gradient ascent method. Note that the feasible domain of Eq. (7) is non-convex, which means that our algorithm can only reach local optima. However, we show by empirical study (in Section 5.1) that our algorithm yields much better results than spectral clustering. In a typical implementation of the modularity function maximization with nonnegative relaxation algorithm, the computational complexity is $O(n^2 K)$, which is similar to that of the traditional spectral clustering approach. The correctness and convergence of the proposed algorithm are assured by the following two theorems.

Theorem 1 Fixed points of Eq. (14) satisfy the KKT condition of the optimization problem of Eq. (8).

Theorem 2 Under the update rule of Eq. (14), the Lagrangian function

$$
L = \mathrm{Tr}\left[ Q^T (\rho I - M) Q - \Lambda (Q^T Q - I) \right] \tag{15}
$$

increases monotonically.
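
The nonnegative relaxation procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes NumPy, a square-root form of the multiplicative update, a random positive initialisation with unit-norm columns, a fixed iteration count, and a small `eps` in the denominator for numerical safety. The function name `nonnegative_relaxation`, its parameters, and the toy graph are hypothetical.

```python
import numpy as np

def nonnegative_relaxation(M, K, iters=500, seed=0, eps=1e-12, Q0=None):
    """Multiplicative updates for max Tr Q^T (rho*I - M) Q with Q >= 0 (sketch)."""
    n = M.shape[0]
    rho = np.linalg.eigvalsh(M)[-1]            # largest eigenvalue of M
    M_pos = (np.abs(M) + M) / 2                # M = M_pos - M_neg
    M_neg = (np.abs(M) - M) / 2
    if Q0 is None:                             # random positive start (assumption)
        Q0 = np.random.default_rng(seed).random((n, K)) + 0.1
    Q = np.asarray(Q0, dtype=float)
    Q = Q / np.linalg.norm(Q, axis=0)          # unit-norm columns
    A = rho * np.eye(n) - M
    for _ in range(iters):
        Lam = Q.T @ A @ Q                      # Lagrange multiplier estimate
        L_pos = (np.abs(Lam) + Lam) / 2
        L_neg = (np.abs(Lam) - Lam) / 2
        numer = rho * Q + M_neg @ Q + Q @ L_neg
        denom = M_pos @ Q + Q @ L_pos + eps    # eps avoids division by zero
        Q = Q * np.sqrt(numer / denom)
    return Q

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3).
W = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[u, v] = W[v, u] = 1.0
M = 0.5 * np.diag(W.sum(axis=1)) - W           # D-measure Laplacian (lambda = 0.5)
Q = nonnegative_relaxation(M, 2)
print(Q.argmax(axis=1))                        # hard community labels per node
```

Because the update only rescales entries by nonnegative factors, a nonnegative start stays nonnegative throughout, and the largest entry in each row can be read off as the community label.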

The proof of Theorem 1 follows directly from the derivation of the update rule (14), and the proof of Theorem 2 is given in Appendix A. As mentioned before, the solution is very close to the ideal class indicator matrix due to the orthonormality and nonnegativity constraints. Thus Q can be used directly to assign cluster labels to data points. Specifically, the node $u$ is assigned to community $k^*$ as

$$
k^* = \arg\max_{k} Q_{uk}. \tag{16}
$$

4 Soft clustering and fuzzy community detection

In fact, the columns of the indicator Q are not exactly orthogonal, since the off-diagonal elements of the Lagrange multiplier $\Lambda$ are obtained by temporarily ignoring the nonnegativity constraint. This slight deviation from rigorous orthogonality yields the benefit of soft clustering. Exact orthogonality implies that each row of Q can have only one nonzero element, which means that each node belongs to only one community. This is hard clustering, as in K-means. The near-orthogonality condition relaxes this a bit, i.e., each node may belong fractionally to more than one community. This is soft clustering.

An interesting phenomenon in community detection is that some nodes may not belong to a single community; assigning them to more than one group is more reasonable [16, 18]. Such nodes have a fuzzy categorization and may thus play a special role, for example, in signal transduction in biological networks. Another issue is that some nodes, considered as unstable vertices in [3], lie on the border between two communities and are hard to classify into any group. Therefore, uncovering these nodes and overlapping communities can facilitate the understanding of the intrinsic structure of networks.

We provide here a systematic analysis of the soft clustering of nodes based on the proposed algorithm. In previous related studies the rows of Q have been interpreted as posterior probabilities for graph partition. In other words, the magnitude of $Q_{uk}$ quantifies the degree to which the node $u$ belongs to community $k$ (see Section 5.2 for illustrative instances). Specifically, after obtaining Q, we first row-normalize it so that $\sum_k Q_{uk} = 1$ and then infer how strongly the nodes belong to their communities according to their community entropies, in order to reveal the so-called unstable vertices. The community entropy of node $u$ is defined as [7]

$$
\xi_u = - \sum_{k} Q_{uk} \log Q_{uk}. \tag{17}
$$

Obviously, the nodes with larger entropy are less stable.
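
The soft-clustering readout can be illustrated with a short sketch: a row of Q is normalized into membership probabilities and its Shannon entropy is computed. The function name `community_entropy` and the optional `base` argument are hypothetical, and the choice of logarithm base is an assumption (with `base=K` the entropy lies between 0 and 1).

```python
import math

def community_entropy(row, base=None):
    """Entropy of one node's membership distribution after row normalization."""
    total = sum(row)
    probs = [q / total for q in row]
    h = -sum(p * math.log(p) for p in probs if p > 0)   # 0 * log 0 is taken as 0
    return h / math.log(base) if base else h

# Hypothetical rows of Q for three nodes in a two-community network.
rows = {"core node": [0.58, 0.0], "near border": [0.45, 0.15], "border": [0.30, 0.30]}
for name, row in rows.items():
    print(name, round(community_entropy(row, base=2), 3))
```

Sorting the nodes by this value in decreasing order lists the least stable vertices first.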
Figure 1: The evolution of the two metrics with respect to $z_{out}$ for the two methods tested on the ad hoc networks with known community structures (horizontal axes: out links $z_{out}$). Each point is an average over 100 realizations of the networks. SR denotes the spectral relaxation approach and NR denotes the nonnegative relaxation approach.

5 Numerical results 

In this section, we extensively test the proposed algorithm on two artificial benchmark networks with a known community structure, namely the ad hoc network with 128 nodes and the LFR benchmarks. After that, the algorithm is applied to three famous real-world networks: Zachary's karate-club network, the journal citation network and the American college football team network.

5.1 Test on artificial networks 
5.1.1 Ad hoc benchmark network

The ad hoc network has 128 nodes, which are divided into 4 communities of 32 nodes
each. Edges are placed randomly with two fixed probabilities, chosen so that the
average degree of a node is 16 and each node has on average $z_{out}$ edges
to nodes of other communities. This experiment was designed by Girvan and Newman [6]
and has been broadly used to test community-detection algorithms [5|I^[T5]. 
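
A generator for this kind of benchmark can be sketched as follows. This is a minimal illustration using only the standard library, assuming independent edges with an intra-community probability $(16 - z_{out})/31$ and an inter-community probability $z_{out}/96$, which give the stated expected degrees; the function name `ad_hoc_network` and its parameters are hypothetical.

```python
import random

def ad_hoc_network(z_out, n=128, groups=4, degree=16, seed=None):
    """Random graph with `groups` equal-size planted communities (sketch)."""
    rng = random.Random(seed)
    size = n // groups
    p_in = (degree - z_out) / (size - 1)     # expected intra-community links
    p_out = z_out / (n - size)               # expected inter-community links
    labels = [u // size for u in range(n)]
    edges = [
        (u, v)
        for u in range(n)
        for v in range(u + 1, n)
        if rng.random() < (p_in if labels[u] == labels[v] else p_out)
    ]
    return edges, labels

edges, labels = ad_hoc_network(z_out=8, seed=1)
external = sum(labels[u] != labels[v] for u, v in edges)
print("average degree:", 2 * len(edges) / len(labels))
print("average external degree:", 2 * external / len(labels))
```

Larger values of `z_out` blur the planted communities, which is why $z_{out} = 8$ is the hardest setting considered.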

Table 1: Simulation results of our algorithm (NR) and the spectral relaxation approach (SR) based on the general $D_\lambda$-measure when $z_{out}$ = 5, 6, 7, 8, respectively. Results are averaged over 100 random realizations.

Metric        Method  λ     z_out = 5     z_out = 6      z_out = 7      z_out = 8
Accuracy (%)  SR      0     99.77±0.39    98.78±0.91     95.50±1.87     83.38±5.83
Accuracy (%)  SR      0.2   99.82±0.35    98.84±0.93     95.88±1.69     84.50±4.93
Accuracy (%)  SR      0.4   99.87±0.31    98.97±0.86     95.96±1.48     84.58±4.78
Accuracy (%)  SR      0.6   99.86±0.36    98.95±0.90     95.52±1.78     81.38±5.24
Accuracy (%)  SR      0.8   99.81±0.44    98.50±2.70     91.27±6.65     70.39±8.94
Accuracy (%)  SR      1.0   99.30±1.60    95.66±5.71     79.82±11.70    58.63±9.24
Accuracy (%)  NR      0     99.81±0.35    98.97±0.84     96.38±1.58     87.30±6.03
Accuracy (%)  NR      0.2   99.84±0.33    99.01±0.84     96.67±1.48     88.97±3.12
Accuracy (%)  NR      0.4   99.86±0.32    99.06±0.88     96.66±1.47     88.77±3.93
Accuracy (%)  NR      0.6   99.86±0.32    99.06±0.87     96.45±1.55     86.66±4.38
Accuracy (%)  NR      0.8   99.85±0.33    98.78±2.25     92.22±8.73     71.14±11.32
Accuracy (%)  NR      1.0   99.42±1.40    96.07±5.01     80.66±11.34    60.77±10.80
NMI (%)       SR      0     99.24±1.26    96.18±2.75     86.67±5.08     60.75±7.58
NMI (%)       SR      0.2   99.42±1.12    96.37±2.85     87.68±4.75     62.29±7.37
NMI (%)       SR      0.4   99.57±1.01    96.73±2.70     87.86±4.13     62.62±6.76
NMI (%)       SR      0.6   99.55±1.15    96.68±2.79     86.81±4.62     57.38±6.79
NMI (%)       SR      0.8   99.39±1.43    95.80±4.86     79.07±9.82     44.48±8.98
NMI (%)       SR      1.0   98.05±3.72    89.85±9.60     62.40±13.20    30.89±9.18
NMI (%)       NR      0     99.39±1.14    96.74±2.63     89.04±4.45     68.44±7.70
NMI (%)       NR      0.2   99.50±1.07    96.86±2.63     89.85±4.24     70.75±6.14
NMI (%)       NR      0.4   99.55±1.03    97.03±2.78     89.85±4.23     70.56±7.31
NMI (%)       NR      0.6   99.55±1.03    97.00±2.77     89.34±4.32     66.93±7.75
NMI (%)       NR      0.8   99.52±1.05    96.55±3.54     83.51±10.64    48.85±10.46
NMI (%)       NR      1.0   98.36±5.02    90.34±12.42    63.83±17.81    36.45±11.57

We use the following two standard evaluation metrics to evaluate the performance of our method and of the spectral relaxation approach.

Accuracy is calculated by

$$
\mathrm{Acc} := \frac{\sum_{i=1}^{n} \delta(l_i, \mathrm{map}(c_i))}{n}, \tag{18}
$$

where $l_i$ is the true cluster label and $c_i$ is the obtained cluster label of the $i$-th node, $\delta(x, y)$ is the delta function, and $\mathrm{map}(\cdot)$ is the best mapping function. Note that $\delta(x, y) = 1$ if $x = y$, and $\delta(x, y) = 0$ otherwise. The mapping function $\mathrm{map}(\cdot)$ matches the obtained cluster labels to the true class labels, and the best mapping is found by the Kuhn-Munkres algorithm [12]. A larger accuracy value indicates a better performance.
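
The accuracy metric can be sketched with the standard library alone. Note the substitution: instead of the Kuhn-Munkres algorithm, this illustration searches all label mappings by brute force, which gives the same optimum but is only practical for a small number of clusters. The function name `clustering_accuracy` is hypothetical.

```python
from itertools import permutations

def clustering_accuracy(true_labels, found_labels):
    """Fraction of nodes labelled correctly under the best one-to-one label mapping."""
    true_ids = sorted(set(true_labels))
    found_ids = sorted(set(found_labels))
    # Pad with dummy targets so that every found label can be mapped somewhere.
    targets = true_ids + [None] * max(0, len(found_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(found_ids)):
        mapping = dict(zip(found_ids, perm))
        hits = sum(t == mapping[f] for t, f in zip(true_labels, found_labels))
        best = max(best, hits)
    return best / len(true_labels)

# Hypothetical labels: the found clustering swaps the names and misplaces one node.
truth = [0, 0, 0, 1, 1, 1]
found = [1, 1, 0, 0, 0, 0]
print(clustering_accuracy(truth, found))
```

The mapping step matters because cluster labels are arbitrary: a clustering that is identical up to renaming should score 1.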

Normalized mutual information (NMI): Given the set of real communities $S_r$ and the set of found communities $S_f$, the mutual information metric $\mathrm{MI}(S_r, S_f)$ is defined as

$$
\mathrm{MI}(S_r, S_f) = \sum_{u \in S_r,\, v \in S_f} p(u, v) \log_2 \frac{p(u, v)}{p(u)\, p(v)}, \tag{19}
$$

where $p(u)$ and $p(v)$ denote the probabilities that a node arbitrarily selected from the network belongs to the real community $u \in S_r$ and to the found community $v \in S_f$, respectively, and $p(u, v)$ denotes the joint probability that this arbitrarily selected node belongs to both communities at the same time. The $\mathrm{NMI}(S_r, S_f)$ is obtained by normalizing $\mathrm{MI}(S_r, S_f)$ with the entropies $H(S_r)$ and $H(S_f)$ of $S_r$ and $S_f$, respectively.
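
A short sketch of the mutual information computation follows, using only the standard library. The normalization is an assumption: the sketch divides by the larger of the two entropies, which is one common convention, and other conventions (for example the mean of the two entropies) give different values for imperfect matches. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of the community-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """MI (in bits) between two labelings of the same nodes."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log2(c * n / (pa[u] * pb[v]))   # p(u,v) log2 p(u,v)/(p(u)p(v))
        for (u, v), c in pab.items()
    )

def nmi(a, b):
    """MI divided by max(H(a), H(b)); this normalization is an assumption."""
    denom = max(entropy(a), entropy(b))
    return mutual_information(a, b) / denom if denom > 0 else 1.0

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))   # identical partitions up to renaming
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))   # statistically independent partitions
```

Unlike accuracy, this metric needs no label mapping, since it depends only on how the two partitions overlap.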

Fig. 1 shows the evolution of the results for the Q-measure and the D-measure with respect to $z_{out}$ obtained by our method (NR) and by the spectral relaxation approach (SR), respectively. From the results, we can observe that, whatever modularity function and evaluation metric is used, our nonnegative relaxation method outperforms the traditional spectral relaxation approach on all benchmark networks, especially in the most difficult case when $z_{out} = 8$. Table 1 gives the simulation results of these two algorithms based on the general $D_\lambda$-measure when $z_{out}$ = 5, 6, 7, 8, respectively. From the table, we see clearly that (1) our method consistently, sometimes significantly, yields more accurate clustering than the spectral relaxation approach; in particular, the results are significantly improved when $z_{out} = 8$ (Table 1); (2) both algorithms achieve their best performance at or close to $\lambda = 0.4$. This is partly due to the fact that the artificial communities are of equal size and similar total degree [11]. In fact, choosing an appropriate value for the parameter $\lambda$ is not a trivial issue. It is beyond the scope of this paper, and we refer interested readers to [11] for more discussion.

Figure 2: Test of our algorithm on the LFR benchmark graphs [8, 9] where the number of nodes $n = 500$ and the average degree $\langle d \rangle = 15$. Each point corresponds to an average over 50 realizations. The term $Q^{SR}$ denotes maximizing the Q-measure based on the spectral relaxation approach, and similarly for the other three terms. For the normalized mutual information, our method always outperforms the spectral relaxation approach, especially when $\mu$ is larger than the threshold $\mu_c = 0.5$ beyond which communities are no longer defined in the strong sense.

5.1.2 The LFR benchmarks 

The LFR benchmark [8, 9] is a realistic class of graphs for community detection, which takes the heterogeneity of both node degree and community size into account. The node degree and community size distributions both follow power laws, with exponents $\gamma$ and $\beta$, respectively. The number of nodes is $n$, and the average degree is $\langle d \rangle$. In the construction of the benchmark networks, each node receives its degree once and for all and keeps it fixed until the end. It is more practical to choose as an independent parameter the mixing parameter $\mu$, which expresses the ratio between the external degree of a node with respect to its community and the total degree of the node. To compare the built-in modular structure with the one delivered by the algorithm, we adopt here the normalized mutual information (NMI), which has proved to be reliable [8, 9].

Figure 3: Test of our algorithm on the LFR benchmark graphs [8, 9] where the number of nodes $n = 500$ and the average degree $\langle d \rangle = 20$. Each point corresponds to an average over 50 realizations. For the normalized mutual information, our method always outperforms the spectral relaxation approach, especially when $\mu$ is larger than the threshold $\mu_c = 0.5$ beyond which communities are no longer defined in the strong sense.

In Figs. 2-4, we show the results obtained when we apply our algorithm to the LFR benchmark with $n = 500$ and $\langle d \rangle = 15, 20, 25$. The four panels correspond to the results for the four pairs of exponents $(\gamma, \beta) = (2, 1), (2, 2), (3, 1), (3, 2)$, respectively. We have chosen combinations of the extremes of the exponent ranges in order to explore the widest spectrum of network structures. Each curve shows the variation of the normalized mutual information with the mixing parameter $\mu$. Clearly, the results depend on all parameters of the benchmark, from the exponents $\gamma$ and $\beta$ to the average degree $\langle d \rangle$. We can see that (1) the performance of our method improves as the average degree $\langle d \rangle$ becomes larger, whereas it gets worse as the mixing parameter $\mu$ becomes larger; (2) the normalized mutual information of our results is always above 0.9 when $\mu$ is less than the threshold $\mu_c = 0.5$, which marks the border beyond which communities are no longer defined in the strong sense, i.e., such that each node has more neighbors in its own community than in the others; (3) whatever modularity function is used, our method always achieves superior performance on all the LFR benchmark problems, especially in the most difficult case when the mixing parameter $\mu$ is larger than $\mu_c$. These observations support the effectiveness of our algorithm.

Figure 4: Test of our algorithm on the LFR benchmark graphs [8, 9] where the number of nodes $n = 500$ and the average degree $\langle d \rangle = 25$. Each point corresponds to an average over 50 realizations. For the normalized mutual information, our method always outperforms the spectral relaxation approach, especially when $\mu$ is larger than the threshold $\mu_c = 0.5$ beyond which communities are no longer defined in the strong sense.



5.2 Applications to real world networks 
5.2.1 Zachary's karate-club network 

This famous network was constructed by Zachary after he observed the social interactions between the 34 members of a karate club at an American university over a period of two years [21]. Soon after, a dispute arose between the club's administrator and its instructor, and the club split into two smaller ones. The question is whether we can uncover the potential behavior of the network, detect the two communities, and, in particular, identify which community each node belongs to. The network has become a prototype for testing algorithms that find community structure in networks [7, 11, 15].

We apply here our new method to this network based on the D-measure. In [21], Zachary gave the partition $S_1$ = {1 : 8, 11 : 14, 17, 18, 20, 22} and $S_2$ = {9, 10, 15, 16, 19, 21, 23 : 34}. We first employ our method as a hard clustering algorithm (Eq. (16) in Section 3) and obtain Acc = 1.000 and NMI = 1.000, which is slightly better than the result obtained by the spectral relaxation approach, Acc = 0.9706 and NMI = 0.8361. Fig. 5 gives the partition, which divides the network into two groups of roughly equal size and is completely consistent with the actual division of the original club. We color each node black or white in the figure to show its membership. From the viewpoint of soft clustering, the attribute of each node is no longer an indicator function but rather a discrete probability distribution. In the following, the association probability $p_1$ or $p_2$ denotes the probability that a node belongs to the black or white colored group, respectively.

Figure 5: The partition of the karate club network obtained by our new method based on the D-measure, shown as a gray-scale plot of $p_1$ and $p_2$ for each node. The darker the color, the larger the value of $p_1$. The transition (intermediate) nodes are clearly visible.

We summarize the association probability and community entropy of each node in Table 2. From Table 2 we find $p_1 = 1$ for nodes {2, 4 : 8, 11 : 13, 17, 18, 22}, which lie at the boundary of the black colored group (Fig. 5), and $p_2 = 1$ for nodes {15, 16, 19, 21, 23 : 30, 33}, which mostly lie at the boundary of the white colored group (except node 33, which lies at the center of the white colored group). The nodes {3, 9, 10, 14, 20, 31} have more diffuse probabilities and higher community entropy values (Table 2), and they play the role of transition nodes between the black and white colored groups. In other words, these nodes constitute the fuzzy boundary between the two communities (Fig. 5). In particular, node 3 is the most diffuse of the nodes that lie exactly between the two smaller clubs. This means that such members have good friendships with both clubs at the same time. We can visualize the data $p$ more transparently with the gray-scale plot for each node shown in Fig. 5. We can also uncover further such nodes with different degrees of instability according to the sorted entropy values.

5.2.2 Journal citation network 

Table 2: The association probability and community entropy of each node. $p_1$ and $p_2$ are the probabilities of belonging to the black or white colored groups in Fig. 5, respectively. Larger entropies mean that the corresponding nodes are more difficult to classify into communities.

The journal citation network, proposed in [19], consists of 40 journals as nodes from 4 different fields (physics, chemistry, biology and ecology) and 189 links connecting nodes if at least one article from one journal cites an article in the other journal during 2004. The ten journals with the highest impact factor in each of the 4 fields were selected. With the D-measure at hand, we partition the network into 4 communities, as also obtained by the spectral relaxation approach (see Fig. 6(a)). This split of the network is perfectly consistent with its original partition, i.e., Acc = 1 and NMI = 1. The community entropy of each node is given by the red bars in Fig. 6(b), where the nodes {Physics_4, Chemistry_19, Biology_22, Biology_27} form the fuzzy boundary of these four communities. In fact, these nodes are densely linked to neighbors in several different communities simultaneously. In particular, the node Physics_4 (Physical Review Letters), which was misclassified into the chemistry cluster in [19] and is correctly classified here, has 9 links to physics, 8 links to chemistry, 5 links to biology and only 1 link to ecology.

Figure 6: (a) Community structure of the journal citation network extracted by our algorithm. The ellipses, squares, diamonds and triangles denote the split of the network into four groups that are perfectly consistent with the original partition. The different colours denote the partition of the network into two clusters, and the blue nodes are further subdivided into two smaller groups denoted by the different shades of the vertices. (b) Soft clustering of the journal citation network. The blue, green and red bars illustrate the association probability of each node when the network is partitioned into two, three, and four modules, respectively.

We also partition the network into two, three, or five modules, as suggested by previous studies [11, 19]. When we split the network into two components, the physics journals group together with the chemistry journals, and the biology journals cluster together with the ecology journals. In this scenario, the nodes {Physics_4, Chemistry_15, Chemistry_19} become the transition nodes between these two communities (the blue bars in Fig. 6(b)). When we partition it into three clusters, the ecology and biology journals separate, but the physics and chemistry journals remain together in a single module. In this case, the nodes {Physics_4, Chemistry_19, Biology_22, Biology_27} are the unstable nodes, which have the largest entropy values (the green bars in Fig. 6(b)). When we further split the network into five communities, the partition is essentially the same as with four, except that the singly connected journal Ecology_38 (Conservation Biology) splits off by itself as a community. All the results are consistent with those in [11, 19].

5.2.3 American college football team network 

The third real network is the college football team network of the United States. The 
schedule of Division I games can be represented by a network, in which the nodes denote 
the 115 teams and the edges represent 613 games played in the course of the year. These 
teams are divided into 12 conferences containing around 8-12 teams each. Games are 
more frequent between members of the same conference than they are between members 
of different conferences, with teams playing an average of about seven intra-conference 
games and four inter-conference games in the 2000 season. Inter-conference play is not 
uniformly distributed; teams that are geographically close to one another but belong 
to different conferences are more likely to play with one another than teams separated 
by large geographic distances. The natural community structure in the network makes
it a commonly used benchmark for testing community detection algorithms [6][11][15].
Using our algorithm, we can split the network into conferences with a high degree
of success, i.e., Acc = 0.9130 and NMI = 0.9232. Fig. 7(a) shows the community
structure of the football team network calculated by our method. We further investigate
the 10 misclassified teams. Four teams, {"Connecticut", "Navy", "NotreDame",
"UtahState"}, belong to the conference "Independents" (denoted as green-edge nodes),
which means that each member of this conference is not associated with any existing
football conference. There are thus few edges between its 5 members, which are
distributed to other communities,

Figure 7: (a) Community structures of the American college football team network. Nodes
are colored according to their different conferences. The 5 white colored nodes have
the highest community entropy values and are the most difficult to classify into
communities. (b) The community entropy of each node. A larger value means that the
corresponding node is more difficult to classify into communities.

but they tend to be clustered with the conference with which they are most closely
associated. Specifically, "Connecticut" is assigned to the community "Mid-American",
"Navy" and "NotreDame" are classified into the community "Big East", and "UtahState"
is assigned to the community "Sun Belt". The remaining 6 teams, {"LouisianaMonroe",
"MiddleTennesseeState", "LouisianaLafayette"} associated with the conference "Sun
Belt" (diamond nodes) and {"BoiseState", "Houston"} associated with the conference
"Western Athletic", are misclassified as "Independent". This happens partly because
the "Sun Belt" teams played nearly as many games against "Western Athletic" teams
(triangle nodes) as they played in their own conference, and they also played quite a
number of games against "Mid-American" teams (square nodes) [Ellll]. All these teams
are incorrectly classified because they played more games against teams in their
assigned communities than against teams in their own conferences.
The community entropy of each team is shown in Fig. 7(b). The nodes with the 5
highest entropy values are the teams {"Navy", "NotreDame", "Connecticut", "TexasTech",
"Texas"} (see the white colored nodes in Fig. 7(b)). Clearly, these teams all play
many games against teams belonging to various different conferences. For example,
the team "Navy" has the highest entropy value, and it plays games against teams
belonging to as many as 6 different conferences, i.e., "Big East" (three times),
"Atlantic Coast" (twice), "Conference USA" (twice), "Mid-American" (once), "Mountain
West" (once) and "Western Athletic" (once). All these results suggest that the
community structure found by our method may reveal a more precise organization
than the original conferences.
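
As a quick sanity check on the figures quoted above, the reported accuracy matches 10 misclassified teams out of 115 if Acc is taken to be the fraction of correctly classified nodes. The sketch below reproduces that arithmetic and adds one common form of normalized mutual information, 2 I(A;B) / (H(A) + H(B)). Both definitions are assumptions made for illustration; the exact definitions used in the paper are given outside this section, and the function name `nmi` is hypothetical.

```python
from collections import Counter
from math import log

# Accuracy, assuming Acc = (number of correctly classified nodes) / (number of nodes).
n_teams = 115
n_misclassified = 10
acc = (n_teams - n_misclassified) / n_teams
print(f"Acc = {acc:.4f}")

def nmi(labels_a, labels_b):
    """Normalized mutual information 2*I(A;B) / (H(A) + H(B)).

    This is one common normalization; it is an assumption here, not
    necessarily the variant used in the paper.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mutual = sum(c / n * log(c * n / (count_a[x] * count_b[y]))
                 for (x, y), c in count_ab.items())
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:          # both partitions consist of a single group
        return 1.0
    return 2 * mutual / (h_a + h_b)

# Identical partitions (up to relabeling) give 1; independent ones give 0.
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

With this normalization the score does not depend on how the communities are labeled, which is why a relabeled but otherwise identical partition still scores 1.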

6 Concluding remarks

In this paper, we first show that the problem of maximizing a one-parameter family of
quantitative functions, encompassing both the modularity (Q-measure) and the modularity
density (D-measure), for community detection can be understood as a combinatoric
optimization involving the trace of a matrix called the modularity Laplacian. We then
propose a nonnegative relaxation of this optimization problem and introduce efficient
algorithms that solve it rigorously under the explicit nonnegative constraint. Unlike
the standard spectral relaxation approach, our solutions are very close to the ideal
class indicator matrix and can directly assign nodes to communities. In addition, we
prove the convergence and correctness of the proposed method. Extensive experimental
results on Newman artificial networks and LFR benchmark networks show that the proposed
algorithms consistently, and sometimes significantly, outperform the traditional
spectral clustering method. Furthermore, we emphasize the soft clustering nature of
this method: owing to their inherent near-orthogonality, the columns of the solution
can be reformulated as the posterior probability of the corresponding node belonging to
each community. Simulation results on real-world networks illustrate that our algorithm
can be applied to fuzzy community detection and unstable node discovery, which can help
to elucidate a network's intrinsic structure and facilitate our analysis.



Acknowledgments 

The authors thank Mu Li for helping to generate the ad hoc artificial networks, Professor
M.E.J. Newman for providing data of the karate club network and the American college
football team network and for useful comments, and Martin Rosvall for providing data of
the journal index network.



A Proof of Theorem 2 



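We use the auxiliary function approach [TU]. An auxiliary function $Z(H, \tilde{H})$ of a
function $L(H)$ satisfies

$$Z(H, H) = L(H), \qquad Z(H, \tilde{H}) \le L(H).$$

We define

$$H^{(t+1)} = \arg\max_{H} Z(H, H^{(t)}). \tag{21}$$

Then by construction, we have

$$L(H^{(t)}) = Z(H^{(t)}, H^{(t)}) \le Z(H^{(t+1)}, H^{(t)}) \le L(H^{(t+1)}). \tag{22}$$

This proves that $L(H^{(t)})$ is monotonically increasing. We write Eq. ffTSj) as

$$L = \mathrm{Tr}\left[ H^{T}(\rho I + M^{-})H + \Lambda^{-}H^{T}H - H^{T}M^{+}H - \Lambda^{+}H^{T}H \right].$$

We can show that one auxiliary function of $L$ is

$$Z(H, \tilde{H}) = \sum_{ijk} (\rho I + M^{-})_{ij}\, \tilde{H}_{ik}\tilde{H}_{jk} \left( 1 + \log\frac{H_{ik}H_{jk}}{\tilde{H}_{ik}\tilde{H}_{jk}} \right) + \sum_{ikl} \Lambda^{-}_{kl}\, \tilde{H}_{ik}\tilde{H}_{il} \left( 1 + \log\frac{H_{ik}H_{il}}{\tilde{H}_{ik}\tilde{H}_{il}} \right) - \sum_{ik} (M^{+}\tilde{H})_{ik} \frac{H_{ik}^{2}}{\tilde{H}_{ik}} - \sum_{ik} (\tilde{H}\Lambda^{+})_{ik} \frac{H_{ik}^{2}}{\tilde{H}_{ik}},$$

using the inequality

$$z \ge 1 + \log z, \qquad z = \frac{H_{ik}H_{jk}}{\tilde{H}_{ik}\tilde{H}_{jk}},$$

and a generic inequality

$$\sum_{i}\sum_{p} \frac{(AS'B)_{ip}\, S_{ip}^{2}}{S'_{ip}} \ge \mathrm{Tr}(S^{T}ASB),$$

where $A, B, S, S' \ge 0$, $A = A^{T}$, $B = B^{T}$. We now find the global maximum of
$Z(H, \tilde{H})$ with respect to $H$. The gradient is

$$\frac{\partial Z(H, \tilde{H})}{\partial H_{ik}} = 2\,\frac{[(M^{-} + \rho I)\tilde{H}]_{ik}\tilde{H}_{ik}}{H_{ik}} + 2\,\frac{(\tilde{H}\Lambda^{-})_{ik}\tilde{H}_{ik}}{H_{ik}} - 2\,\frac{(M^{+}\tilde{H})_{ik}H_{ik}}{\tilde{H}_{ik}} - 2\,\frac{(\tilde{H}\Lambda^{+})_{ik}H_{ik}}{\tilde{H}_{ik}}. \tag{23}$$

The second derivative

$$\frac{\partial^{2} Z(H, \tilde{H})}{\partial H_{ik}\,\partial H_{jl}} = -2\,\delta_{ij}\delta_{kl} \left[ \frac{\left( [(M^{-} + \rho I)\tilde{H}]_{ik} + (\tilde{H}\Lambda^{-})_{ik} \right)\tilde{H}_{ik}}{H_{ik}^{2}} + \frac{(M^{+}\tilde{H})_{ik} + (\tilde{H}\Lambda^{+})_{ik}}{\tilde{H}_{ik}} \right] \tag{24}$$

is negative definite. Thus $Z(H, \tilde{H})$ is a concave function in $H$ and has a unique
global maximum. This maximum is obtained by setting the first derivative to zero,
yielding:

$$H_{ik}^{2} = \tilde{H}_{ik}^{2}\, \frac{[(M^{-} + \rho I)\tilde{H}]_{ik} + (\tilde{H}\Lambda^{-})_{ik}}{(M^{+}\tilde{H})_{ik} + (\tilde{H}\Lambda^{+})_{ik}}. \tag{25}$$

According to Eq. (21), $H^{(t+1)} = H$ and $H^{(t)} = \tilde{H}$, so Eq. (25) is the update
rule of Eq. (7). Thus Eq. (25) always holds.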
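
The monotonicity argument can be checked numerically with a minimal sketch. It applies the square-root multiplicative update of Eq. (25) to random data and records the value of L after every step. Several things here are assumptions made only for illustration: the matrices M^+, M^-, Λ^+ and Λ^- are random, symmetric and entrywise nonnegative, Λ^+ and Λ^- are held fixed across iterations (the full algorithm of Eq. (7) may recompute them), and the function names, the value of rho and the problem sizes are hypothetical.

```python
import numpy as np

def objective(H, M_pos, M_neg, Lam_pos, Lam_neg, rho):
    """L = Tr[H^T (rho I + M^-) H + Lam^- H^T H - H^T M^+ H - Lam^+ H^T H]."""
    P = rho * np.eye(H.shape[0]) + M_neg
    gram = H.T @ H
    return (np.trace(H.T @ P @ H) + np.trace(Lam_neg @ gram)
            - np.trace(H.T @ M_pos @ H) - np.trace(Lam_pos @ gram))

def multiplicative_update(H, M_pos, M_neg, Lam_pos, Lam_neg, rho):
    """One step of Eq. (25): H_ik <- H_ik * sqrt(numerator_ik / denominator_ik)."""
    P = rho * np.eye(H.shape[0]) + M_neg
    numerator = P @ H + H @ Lam_neg
    denominator = M_pos @ H + H @ Lam_pos   # positive for positive inputs
    return H * np.sqrt(numerator / denominator)

rng = np.random.default_rng(0)
n, k, rho = 8, 3, 0.5                        # illustrative sizes and shift

def random_symmetric(size):
    """Random symmetric matrix with entries in [0, 1)."""
    A = rng.random((size, size))
    return (A + A.T) / 2

M_pos, M_neg = random_symmetric(n), random_symmetric(n)
Lam_pos, Lam_neg = random_symmetric(k), random_symmetric(k)   # held fixed (assumption)
H = rng.random((n, k)) + 0.1                 # strictly positive starting point

values = [objective(H, M_pos, M_neg, Lam_pos, Lam_neg, rho)]
for _ in range(20):
    H = multiplicative_update(H, M_pos, M_neg, Lam_pos, Lam_neg, rho)
    values.append(objective(H, M_pos, M_neg, Lam_pos, Lam_neg, rho))
```

Under these assumptions the recorded values should never decrease, up to floating-point rounding, which is the content of Eq. (22). The update also keeps every entry of H positive, since it only rescales each entry by a positive factor.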